Pointwise mutual information

Pointwise mutual information (PMI), or point mutual information, is a measure of association used in information theory and statistics.

Definition

The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence under their joint distribution and the probability of their coincidence expected under independence, i.e., given only their individual distributions. Mathematically:


\operatorname{pmi}(x;y) \equiv \log\frac{p(x,y)}{p(x)p(y)} = \log\frac{p(x|y)}{p(x)} = \log\frac{p(y|x)}{p(y)}.

The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes.

The measure is symmetric (pmi(x;y) = pmi(y;x)). It can take positive or negative values, but is zero if X and Y are independent. PMI is maximized when X and Y are perfectly associated (i.e., p(x|y) = 1 or p(y|x) = 1), yielding the following bounds:


-\infty \leq \operatorname{pmi}(x;y) \leq \min\left[ -\log p(x), -\log p(y) \right]

Finally, \operatorname{pmi}(x;y) will increase if p(x|y) is fixed but p(x) decreases.

Here is an example to illustrate:

x   y   p(x, y)
0   0   0.1
0   1   0.7
1   0   0.15
1   1   0.05

Using this table we can marginalize to get the following additional table for the individual distributions:

value   p(x)   p(y)
0       0.8    0.25
1       0.2    0.75

With this example, we can compute the four values of pmi(x;y) using base-2 logarithms:

pmi(x=0;y=0) −1
pmi(x=0;y=1) 0.222392421
pmi(x=1;y=0) 1.584962501
pmi(x=1;y=1) −1.584962501

(For reference, the mutual information \operatorname{I}(X;Y) would then be 0.214170945.)
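
The same numbers can be reproduced with a short Python sketch (the dictionaries and helper function below are purely illustrative, not part of any standard library):

from math import log2

# Joint distribution p(x, y) from the table above.
p_xy = {(0, 0): 0.10, (0, 1): 0.70, (1, 0): 0.15, (1, 1): 0.05}

# Marginal distributions p(x) and p(y), obtained by summing out the other variable.
p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in p_xy.items() if yi == y) for y in (0, 1)}

def pmi(x, y):
    # Pointwise mutual information of the pair of outcomes (x, y), in bits.
    return log2(p_xy[(x, y)] / (p_x[x] * p_y[y]))

for x in (0, 1):
    for y in (0, 1):
        print(f"pmi(x={x};y={y}) = {pmi(x, y):.9f}")

# Mutual information I(X;Y) is the expected value of the PMI under p(x, y).
mi = sum(p * pmi(x, y) for (x, y), p in p_xy.items())
print(f"I(X;Y) = {mi:.9f}")  # 0.214170945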

Similarities to Mutual Information

Pointwise mutual information has many of the same relationships as mutual information. In particular,


\begin{align}
\operatorname{pmi}(x;y) &= h(x) + h(y) - h(x,y) \\ 
 &= h(x) - h(x|y) \\ 
 &= h(y) - h(y|x)
\end{align}

where h(x) is the self-information of the outcome x, i.e. -\log_2 p(X=x), and h(x,y) and h(x|y) are the corresponding joint and conditional self-informations.
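
A quick numerical check of the first identity, using the entry p(x=0, y=1) = 0.7 from the example above (again just an illustrative sketch):

from math import log2

p_x, p_y, p_xy = 0.8, 0.75, 0.7   # p(x=0), p(y=1), p(x=0, y=1) from the tables above
h = lambda p: -log2(p)            # self-information, in bits

pmi_direct   = log2(p_xy / (p_x * p_y))    # log p(x,y) / (p(x) p(y))
pmi_identity = h(p_x) + h(p_y) - h(p_xy)   # h(x) + h(y) - h(x,y)
print(pmi_direct, pmi_identity)            # both equal 0.222392421...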

Normalized pointwise mutual information (npmi)

Pointwise mutual information can be normalized to the interval [-1, +1], resulting in -1 (in the limit) for outcomes that never occur together, 0 for independence, and +1 for complete co-occurrence.



\operatorname{npmi}(x;y) = \frac{\operatorname{pmi}(x;y)}{h(x,y)},

where h(x,y) = -\log p(x,y) is the joint self-information.
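
A minimal Python sketch of this normalization (illustrative only; it assumes base-2 logarithms and the h(x,y) normalization given above):

from math import log2

def npmi(p_xy, p_x, p_y):
    # Normalized pointwise mutual information, bounded in [-1, +1].
    return log2(p_xy / (p_x * p_y)) / -log2(p_xy)

print(npmi(0.70, 0.80, 0.75))   # partial association: about 0.43
print(npmi(0.25, 0.25, 0.25))   # complete co-occurrence: exactly +1.0
print(npmi(0.06, 0.20, 0.30))   # independence (p(x,y) = p(x)p(y)): 0.0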

Chain-rule for pmi

Pointwise mutual information obeys the chain rule, that is,

\operatorname{pmi}(x;yz) = \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y)

This is easily proven by:


\begin{align}
\operatorname{pmi}(x;y) + \operatorname{pmi}(x;z|y) & {} = \log\frac{p(x,y)}{p(x)p(y)} + \log\frac{p(x,z|y)}{p(x|y)p(z|y)} \\ 
& {} = \log \left[ \frac{p(x,y)}{p(x)p(y)} \frac{p(x,z|y)}{p(x|y)p(z|y)} \right] \\ 
& {} = \log \frac{p(x|y)p(y)p(x,z|y)}{p(x)p(y)p(x|y)p(z|y)} \\
& {} = \log \frac{p(x,yz)}{p(x)p(yz)} \\
& {} = \operatorname{pmi}(x;yz)
\end{align}
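
The chain rule can also be checked numerically; the following Python sketch uses an arbitrary, made-up joint distribution over three binary variables purely for illustration:

from math import log2, isclose

# An arbitrary, made-up joint distribution p(x, y, z) over three binary variables.
p_xyz = {(0, 0, 0): 0.10, (0, 0, 1): 0.15, (0, 1, 0): 0.05, (0, 1, 1): 0.30,
         (1, 0, 0): 0.10, (1, 0, 1): 0.05, (1, 1, 0): 0.20, (1, 1, 1): 0.05}

def marg(keep):
    # Marginalize p(x, y, z) onto the given variable positions (0 = x, 1 = y, 2 = z).
    out = {}
    for key, p in p_xyz.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

p_x, p_y = marg([0]), marg([1])
p_xy, p_yz = marg([0, 1]), marg([1, 2])

x, y, z = 0, 1, 1
pmi_x_yz = log2(p_xyz[(x, y, z)] / (p_x[(x,)] * p_yz[(y, z)]))
pmi_x_y  = log2(p_xy[(x, y)] / (p_x[(x,)] * p_y[(y,)]))
# Conditional PMI: pmi(x;z|y) = log p(x,z|y) / (p(x|y) p(z|y)).
pmi_x_z_given_y = log2((p_xyz[(x, y, z)] / p_y[(y,)]) /
                       ((p_xy[(x, y)] / p_y[(y,)]) * (p_yz[(y, z)] / p_y[(y,)])))

assert isclose(pmi_x_yz, pmi_x_y + pmi_x_z_given_y)   # the chain rule holds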
